A Semi-Supervised Key Phrase Extraction Approach: Learning from Title Phrases through a Document Semantic Network
نویسندگان
چکیده
It is a fundamental and important task to extract key phrases from documents. Generally, phrases in a document are not independent in delivering the content of the document. In order to capture and make better use of their relationships in key phrase extraction, we suggest exploring the Wikipedia knowledge to model a document as a semantic network, where both n-ary and binary relationships among phrases are formulated. Based on a commonly accepted assumption that the title of a document is always elaborated to reflect the content of a document and consequently key phrases tend to have close semantics to the title, we propose a novel semi-supervised key phrase extraction approach in this paper by computing the phrase importance in the semantic network, through which the influence of title phrases is propagated to the other phrases iteratively. Experimental results demonstrate the remarkable performance of this approach.
منابع مشابه
روش جدید متنکاوی برای استخراج اطلاعات زمینه کاربر بهمنظور بهبود رتبهبندی نتایج موتور جستجو
Today, the importance of text processing and its usages is well known among researchers and students. The amount of textual, documental materials increase day by day. So we need useful ways to save them and retrieve information from these materials. For example, search engines such as Google, Yahoo, Bing and etc. need to read so many web documents and retrieve the most similar ones to the user ...
متن کاملExploiting Description Knowledge for Keyphrase Extraction
Keyphrase extraction is essential for many IR and NLP tasks. Existing methods usually use the phrases of the document separately without distinguishing the potential semantic correlations among them, or other statistical features from knowledge bases such as WordNet and Wikipedia. However, the mutual semantic information between phrases is also important, and exploiting their correlations may p...
متن کاملNeural Models for Key Phrase Detection and Question Generation
We propose a two-stage neural model to tackle question generation from documents. Our model first estimates the probability that word sequences in a document compose “interesting” answers using a neural model trained on a questionanswering corpus. We thus take a data-driven approach to interestingness. Predicted key phrases then act as target answers that condition a sequence-tosequence questio...
متن کاملScalable Semantic Parsing with Partial Ontologies
We consider the problem of building scalable semantic parsers for Freebase, and present a new approach for learning to do partial analyses that ground as much of the input text as possible without requiring that all content words be mapped to Freebase concepts. We study this problem on two newly introduced large-scale noun phrase datasets, and present a new semantic parsing model and semi-super...
متن کاملLinguistic Knowledge Based Supervised Key - phrase Extraction
The most important information about the content of a document is represented by the key phrases of that document. In this study an automatic key phrase extraction algorithm is devised using machine learning technique. The proposed method not only considers the document level statistics like TFxIDF, the linguistic features of the phrases are also incorporated. Experiment has been performed on N...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2010